Case study: how do features of nesting female horseshoe crabs influence the number of males found nearby?

Load the data. Here are the first six of its 173 rows:

colour  spine  width  weight  n_male
2       3      28.3   3.05    8
3       3      26.0   2.60    4
3       3      25.6   2.15    0
4       2      21.0   1.85    0
2       3      29.0   3.00    1
1       2      25.0   2.30    3

Predictors: colour, spine condition, carapace width, and weight.

First, let’s see how carapace width influences the mean number of males nearby.

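Data source: H. Jane Brockmann’s 1996 paper (https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1439-0310.1996.tb01099.x). The data file is available online at https://newonlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson07/crab/index.txt, and another regression demo with this data is at https://newonlinecourses.science.psu.edu/stat504/node/169/.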

Approach 1: Estimate regression curve / model function locally

Preliminary questions

These questions are meant to check your understanding of local regression.

What is the estimated mean number of nearby males for nesting females with a carapace width of 32.5? Compute the estimate by hand using each of the following methods.

1. kNN with \(k=3\).

2. A moving window with a radius of 2.4.

3. A kernel smoother with a Gaussian kernel of variance 1.

4. Local polynomials with a radius of 2.4 and a flat kernel, first with degree 1, then with degree 2.
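The four estimators listed above can be sketched in a few lines of base R. This is a minimal illustration rather than the worksheet's solution. It uses only the six rows printed earlier, stored in a hypothetical data frame `crab6`, and a hypothetical query width `x0 = 26.5`, because a width of 32.5 has no neighbours within 2.4 units in such a small subset. With the full data, the same steps would be run at 32.5.

```r
# First six rows of the crab data, copied from the table above
crab6 <- data.frame(
  width  = c(28.3, 26.0, 25.6, 21.0, 29.0, 25.0),
  n_male = c(8, 4, 0, 0, 1, 3)
)

x0 <- 26.5                      # hypothetical query width (assumption)
d  <- abs(crab6$width - x0)     # distance of every crab from the query

# 1. kNN with k = 3: average the responses of the 3 closest widths
knn_est <- mean(crab6$n_male[order(d)[1:3]])

# 2. Moving window: average the responses within radius 2.4
in_window  <- d <= 2.4
window_est <- mean(crab6$n_male[in_window])

# 3. Gaussian kernel with variance 1 (sd = 1): weighted average
w          <- dnorm(crab6$width, mean = x0, sd = 1)
kernel_est <- sum(w * crab6$n_male) / sum(w)

# 4. Local polynomial with a flat kernel: ordinary least squares on the
#    points inside the window, evaluated at x0
nearby <- crab6[in_window, ]
query  <- data.frame(width = x0)
lp1 <- predict(lm(n_male ~ width, data = nearby), newdata = query)
lp2 <- predict(lm(n_male ~ poly(width, 2), data = nearby), newdata = query)

c(knn = knn_est, window = window_est, kernel = kernel_est,
  local_linear = unname(lp1), local_quadratic = unname(lp2))
```

With these six rows and `x0 = 26.5`, the kNN estimate averages the counts 4, 0 and 3, while the moving window averages 8, 4, 0 and 3. The kernel smoother uses every row but gives distant crabs almost no weight.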

Fit a smoother by eye

Tune the loess fit by eye. To keep things simple, modify only the span.

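# Assumes library(tidyverse) is attached and `crab` holds the data loaded above.
# Evaluation grid spanning the observed carapace widths
grid <- seq(min(crab$width), max(crab$width), length.out = 100)
grid_df <- tibble(width = grid)
# FIT_MODEL_HERE
# PLOT_CURVE_HERE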
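One possible way to fill in the fitting step is sketched below. To stay self-contained, it runs on simulated data standing in for `crab` (the data frame `sim`, its coefficients, and the helper `loess_curve()` are all assumptions made for illustration, not part of the worksheet), and it uses a base-R data frame instead of a tibble.

```r
# Simulated stand-in for the crab data (NOT the real measurements)
set.seed(1)
sim <- data.frame(width = runif(150, min = 21, max = 33.5))
sim$n_male <- rpois(150, lambda = exp(-3 + 0.15 * sim$width))

# Evaluation grid spanning the simulated widths
grid_df <- data.frame(
  width = seq(min(sim$width), max(sim$width), length.out = 100)
)

# Hypothetical helper: fit loess at a given span, predict on the grid
loess_curve <- function(span) {
  fit <- loess(n_male ~ width, data = sim, span = span)
  predict(fit, newdata = grid_df)
}

grid_df$wiggly <- loess_curve(span = 0.2)   # small span: follows local noise
grid_df$smooth <- loess_curve(span = 0.9)   # large span: nearly global fit
```

Either column can then be drawn over the scatterplot, for example with `lines()` in base graphics or `geom_line()` in ggplot2. Comparing a few spans side by side is usually enough to choose one by eye.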

What’s the error of this model? Training error is fine.
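Training error for a loess fit could be computed along the following lines, here as mean squared error. The sketch again uses simulated data in place of `crab`. The data frame `sim`, the helper `mse()`, and the three span values are assumptions chosen for illustration.

```r
# Simulated stand-in for the crab data (NOT the real measurements)
set.seed(1)
sim <- data.frame(width = runif(150, min = 21, max = 33.5))
sim$n_male <- rpois(150, lambda = exp(-3 + 0.15 * sim$width))

# Mean squared error between observed and fitted responses
mse <- function(observed, fitted) mean((observed - fitted)^2)

# Training MSE of loess fits at a few spans
spans <- c(0.2, 0.5, 0.9)
train_mse <- sapply(spans, function(s) {
  fit <- loess(n_male ~ width, data = sim, span = s)
  mse(sim$n_male, predict(fit))   # predict() without newdata: fitted values
})
names(train_mse) <- paste0("span_", spans)
train_mse
```

Training error tends to shrink as the span shrinks, because a more flexible curve chases the noise. That is worth keeping in mind when comparing it against the fit chosen by eye.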

How well does this model answer our original question?

Approach 2: Linear Regression

Fit a linear regression model

Fit a linear regression model. What’s the error?
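A minimal sketch of the fit and its training error follows. It uses only the six rows printed at the top (the data frame name `crab6` is an assumption), so the numbers it produces will differ from those for the full 173-row data.

```r
# First six rows of the crab data, copied from the table above
crab6 <- data.frame(
  width  = c(28.3, 26.0, 25.6, 21.0, 29.0, 25.0),
  n_male = c(8, 4, 0, 0, 1, 3)
)

# Straight-line model for the mean number of males as a function of width
fit_lm <- lm(n_male ~ width, data = crab6)
coef(fit_lm)

# Training mean squared error
train_mse <- mean(residuals(fit_lm)^2)
train_mse
```

On the full data, replacing `crab6` with `crab` is the only change needed.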

How well does this model answer our original question? Do you see a potential problem with this model? Are any assumptions of linear regression violated? Brainstorm ideas for how to deal with the problems.
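One check that may help the brainstorm is whether the fitted line ever predicts a negative number of males, which is impossible for a count. The sketch below runs that check on the six rows printed at the top (`crab6` and `check_df` are hypothetical names). The grid deliberately starts one unit below the smallest width in those six rows, so part of it is extrapolation.

```r
# First six rows of the crab data, copied from the table above
crab6 <- data.frame(
  width  = c(28.3, 26.0, 25.6, 21.0, 29.0, 25.0),
  n_male = c(8, 4, 0, 0, 1, 3)
)

fit_lm <- lm(n_male ~ width, data = crab6)

# Predict over a grid that starts slightly below the smallest width
check_df <- data.frame(width = seq(20, 30, by = 0.5))
check_df$pred <- predict(fit_lm, newdata = check_df)

# Rows where the straight line predicts an impossible negative mean
subset(check_df, pred < 0)
```

With these six rows, the line dips below zero just under the smallest observed width. A plot of residuals against fitted values would be a natural companion check for non-constant variance.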

Approach 3: Link Function

Fit a GLM. What’s the error?
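A generalized linear model with a log link is one way to keep the estimated mean count positive. The sketch below assumes a Poisson family, which the worksheet does not specify, and fits it to the six rows printed at the top (`crab6` and `check_df` are hypothetical names). The error is computed on the response scale so that it is comparable with a mean squared error from a linear fit.

```r
# First six rows of the crab data, copied from the table above
crab6 <- data.frame(
  width  = c(28.3, 26.0, 25.6, 21.0, 29.0, 25.0),
  n_male = c(8, 4, 0, 0, 1, 3)
)

# Poisson regression: log(mean number of males) is linear in width
fit_glm <- glm(n_male ~ width, family = poisson(link = "log"), data = crab6)

# Predicted means on the response scale are exp(linear predictor) > 0
check_df <- data.frame(width = seq(20, 30, by = 0.5))
check_df$pred <- predict(fit_glm, newdata = check_df, type = "response")

# Training mean squared error on the response scale
train_mse <- mean((crab6$n_male - fitted(fit_glm))^2)
train_mse
```

Because predictions come from exponentiating the linear predictor, they cannot be negative at any width, inside or outside the observed range.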